Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
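Given sufficient training data, transformer-based models such as the Vision Transformer (ViT) can outperform convolutional neural networks (CNNs) on some vision tasks. However, CNNs have a strong and useful inductive bias for vision tasks (namely translation equivariance and locality). In this work, we develop a novel model architecture that we call the Mobile Fish Landmark Detection network (MFLD-net). We built this model using convolution operations based on ViT (i.e., patch embeddings and multi-layer perceptrons). MFLD-net achieves competitive or better results while remaining lightweight, which makes it suitable for embedded and mobile devices. Furthermore, we show that on a fish image dataset MFLD-net achieves keypoint (landmark) estimation accuracy on par with, or even better than, some state-of-the-art CNNs. In addition, unlike ViT, MFLD-net does not require a pre-trained model and generalizes well when trained on small datasets. We provide quantitative and qualitative results that demonstrate the model's generalization capabilities. This work lays a foundation for future efforts to develop mobile yet efficient fish monitoring systems and devices.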
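In this paper, we apply pre-processing techniques to multi-channel time series data of varying lengths, a setting we refer to as the alignment problem, for downstream machine learning. Multi-channel time series data can become misaligned for a variety of reasons, such as missing data, varying sampling rates, or inconsistent collection times. We consider multi-channel time series data collected from the MIT SuperCloud High Performance Computing (HPC) center, where differing job start times and varying run times of HPC jobs lead to misaligned data. This misalignment makes it challenging to build AI/ML approaches for tasks such as compute workload classification. Building on previous supervised classification work with the MIT SuperCloud Dataset, we address the alignment problem with three broad, low-overhead approaches: sampling a fixed subset from the full time series, computing summary statistics over the full time series, and sampling a subset of coefficients from the time series mapped to the frequency domain. Our best-performing models achieve a classification accuracy above 95%, outperforming previous approaches to multi-channel time series classification on the MIT SuperCloud Dataset by 5%. These results indicate that our low-overhead approaches, combined with standard machine learning techniques, can reach high levels of classification accuracy, and that they can serve as a baseline for future approaches to the alignment problem, such as kernel methods.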
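The three alignment approaches named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of evenly spaced samples, the particular statistics (mean, standard deviation, minimum, maximum), and the use of low-frequency DFT magnitudes are all choices made for the example. It uses only the Python standard library; in practice a routine such as `numpy.fft.rfft` would replace the hand-written transform.

```python
import cmath
import statistics


def sample_fixed_subset(series, n_points):
    """Approach 1 (assumed variant): keep n_points evenly spaced samples."""
    if n_points == 1:
        return [series[0]]
    last = len(series) - 1
    return [series[round(i * last / (n_points - 1))] for i in range(n_points)]


def summary_statistics(series):
    """Approach 2 (assumed statistics): mean, population std, min, max."""
    return [
        statistics.fmean(series),
        statistics.pstdev(series),
        min(series),
        max(series),
    ]


def low_frequency_magnitudes(series, n_coeffs):
    """Approach 3 (assumed variant): magnitudes of the first n_coeffs
    discrete Fourier transform coefficients, divided by the series length."""
    n = len(series)
    features = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(series)
        )
        features.append(abs(coeff) / n)
    return features


def featurize(channels, reducer):
    """Apply one reducer to every channel and concatenate the results."""
    features = []
    for channel in channels:
        features.extend(reducer(channel))
    return features


if __name__ == "__main__":
    # Two hypothetical jobs with different run times (5 vs. 50 samples).
    short_job = [[1.0, 2.0, 3.0, 4.0, 5.0]]
    long_job = [[float(i % 7) for i in range(50)]]
    for job in (short_job, long_job):
        vector = featurize(job, summary_statistics)
        print(len(vector))  # same length for both jobs
```

Each reducer maps a series of any length to a fixed-length vector, which is what allows jobs of different durations to be fed to a standard classifier.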
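We demonstrate a physics-aware transformer for feature-based data fusion from cameras with diverse resolutions, color spaces, focal lengths, and exposures. We also demonstrate a scalable solution for generating synthetic training data for the transformer using open-source computer graphics software. We demonstrate image synthesis on arrays with diverse spectral responses, instantaneous fields of view, and frame rates.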
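Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are still poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing on linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but remain poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.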
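We carefully compare two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding-horizon model predictive control (MPC) for operating simulated, price-responsive water heaters. Four MPC variants are considered: a one-shot controller with perfect forecasting, which yields optimal control; a limited-horizon controller with perfect forecasting; a mean-forecast-based controller; and a two-stage stochastic programming controller that uses historical scenarios. In all cases, the MPC models for water temperature and electricity price are exact; only water demand is uncertain. For comparison, ES and PPO learn neural-network-based policies by interacting directly with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging: MPC with perfect forecasting needs a lookahead of more than 8 hours to attain the minimum cost. Despite this challenge, both ES and PPO learn good general-purpose policies that outperform the mean-forecast and two-stage stochastic MPC controllers in terms of average cost, and they are orders of magnitude faster at computing actions. We show that ES in particular can exploit parallelism, learning a policy in under 90 seconds using 1150 CPU cores.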
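To make the Evolution Strategies idea concrete, here is a minimal sketch of a basic ES update on a toy problem. It is not the paper's controller or simulator: the quadratic reward, the antithetic (mirrored) sampling, the hyperparameters, and the function name `evolution_strategies` are assumptions chosen for illustration. The structure is the general one: perturb the parameters with Gaussian noise, score each perturbation by its reward alone, and move the parameters along the reward-weighted average of the noise.

```python
import random


def evolution_strategies(reward_fn, theta, iterations=200, population=20,
                         sigma=0.1, learning_rate=0.05, seed=0):
    """Maximize reward_fn over a parameter vector with a basic ES loop."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        gradient = [0.0] * len(theta)
        for _ in range(population):
            noise = [rng.gauss(0.0, 1.0) for _ in theta]
            # Antithetic pair: evaluate theta + sigma*noise and theta - sigma*noise.
            plus = reward_fn([t + sigma * e for t, e in zip(theta, noise)])
            minus = reward_fn([t - sigma * e for t, e in zip(theta, noise)])
            for i, e in enumerate(noise):
                gradient[i] += (plus - minus) * e
        # Finite-difference estimate of the reward gradient, then an ascent step.
        scale = learning_rate / (2.0 * sigma * population)
        theta = [t + scale * g for t, g in zip(theta, gradient)]
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0]  # hypothetical optimum of the toy reward

    def reward(params):
        return -sum((p - t) ** 2 for p, t in zip(params, target))

    print(evolution_strategies(reward, [0.0, 0.0]))
```

Every reward evaluation inside the population loop is independent of the others, which is the property that lets ES spread its work across many CPU cores.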
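We model Alzheimer's disease (AD) progression by combining differential equations (DEs) and reinforcement learning (RL) with domain knowledge. DEs provide relationships between some, but not all, of the factors relevant to AD. We assume that the missing relationships must satisfy general criteria about how the brain works, for example maximizing cognition while minimizing the cost of supporting cognition. This allows us to extract the missing relationships by using RL to optimize an objective (reward) function that captures these criteria. We use our model, which consists of the DEs (as a simulator) and the trained RL agent, to predict individualized 10-year AD progression from baseline (year 0) features on synthetic and real data. The model is comparable to or better than state-of-the-art learning-based models at predicting 10-year cognition trajectories. Our interpretable model demonstrates, and provides insights into, "recovery/compensatory" processes that mitigate the effects of AD, even though these processes are not explicitly encoded in the model. Our framework combines DEs with RL to model AD progression and is broadly applicable to understanding other neurological disorders.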
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.
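As an illustration of what "using the uncertainty labels" can mean during training, the sketch below shows three simple policies: treat uncertain as negative, treat it as positive, or mask it out of the loss. The abstract does not specify its approaches, so the policy names, the `-1` encoding for "uncertain", and both function names are assumptions made for this example.

```python
import math

UNCERTAIN = -1  # assumed encoding: 1 = positive, 0 = negative, -1 = uncertain


def resolve_labels(labels, policy):
    """Return (targets, mask) for one study under an uncertainty policy.

    policy is one of the hypothetical names "zeros", "ones", or "ignore".
    A mask value of 0.0 removes that observation from the loss.
    """
    targets, mask = [], []
    for label in labels:
        if label != UNCERTAIN:
            targets.append(float(label))
            mask.append(1.0)
        elif policy == "zeros":
            targets.append(0.0)
            mask.append(1.0)
        elif policy == "ones":
            targets.append(1.0)
            mask.append(1.0)
        elif policy == "ignore":
            targets.append(0.0)  # placeholder value, masked out below
            mask.append(0.0)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return targets, mask


def masked_binary_cross_entropy(probabilities, targets, mask):
    """Mean binary cross-entropy over the observations the mask keeps."""
    total, kept = 0.0, 0.0
    for p, y, m in zip(probabilities, targets, mask):
        total += -m * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        kept += m
    return total / kept if kept else 0.0


if __name__ == "__main__":
    labels = [1, 0, UNCERTAIN]      # three observations for one study
    predictions = [0.8, 0.3, 0.6]   # hypothetical model outputs
    for policy in ("zeros", "ones", "ignore"):
        targets, mask = resolve_labels(labels, policy)
        print(policy, masked_binary_cross_entropy(predictions, targets, mask))
```

Because the policy is applied per observation, a different policy could be chosen for each pathology, which fits the finding that different uncertainty approaches are useful for different pathologies.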
In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains error behaviors of the original buggy program. If the approach cannot determine from invariants whether a patch overfits, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
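The two invariant-based overfitting conditions can be sketched with plain sets. This is a simplified illustration, not INVALIDATOR's actual implementation: invariants are represented as strings, "correct specifications" are approximated as invariants that hold on both the buggy and the developer-patched program, "error behaviors" as invariants that hold only on the buggy program, and the fallback `syntax_classifier` is a hypothetical callable standing in for the trained model. All of these are assumptions made for the example.

```python
def classify_patch(buggy_invariants, developer_invariants, patch_invariants,
                   syntax_classifier=None):
    """Return (verdict, reason) for an APR-generated patch.

    Each *_invariants argument is a set of invariant strings inferred on
    the corresponding program (an assumed representation).
    """
    # Assumed approximation: behavior preserved by the developer's fix.
    correct_specifications = buggy_invariants & developer_invariants
    # Assumed approximation: behavior the developer's fix removed.
    error_behaviors = buggy_invariants - developer_invariants

    violated = correct_specifications - patch_invariants
    if violated:
        return "overfitting", "violates correct specifications"

    maintained = error_behaviors & patch_invariants
    if maintained:
        return "overfitting", "maintains error behaviors"

    # Invariants were inconclusive: fall back to the syntax-based model.
    if syntax_classifier is not None and syntax_classifier():
        return "overfitting", "flagged by syntax-based classifier"
    return "undetermined", "no overfitting evidence found"


if __name__ == "__main__":
    buggy = {"x >= 0", "result == x - 1"}
    developer = {"x >= 0", "result == x + 1"}
    patch = {"x >= 0", "result == x - 1"}  # keeps the buggy behavior
    print(classify_patch(buggy, developer, patch))
```

The order of the checks mirrors the description: invariant reasoning runs first, and the learned syntax-based model is consulted only when the invariants do not settle the question.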
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.